Partitioning Inverted Lists for Efficient Evaluation of Set-Containment Joins in Main Memory

Author

  • Dmitry Shaporenkov
Abstract

We present an algorithm for efficient processing of set-containment joins in main memory. Our algorithm uses an index structure based on inverted files. We focus on improving the performance of the algorithm in a main-memory environment by using the L2 CPU cache more efficiently. To achieve this, we employ several optimizations, including partitioning the inverted lists and compressing the intermediate results.
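The abstract does not spell out the algorithm, so the following is only a rough Python sketch of a textbook inverted-list set-containment join (pairs with r ⊆ s), not the author's implementation. Each element maps to the sorted ids of the S records that contain it, and the S records containing a record r are the intersection of the lists for r's elements. The partitioning step is an assumption: S ids are split into contiguous ranges of a hypothetical size `partition_size`, and every R record is probed against one partition's shorter lists before moving to the next, which is one plausible way to keep the working set small enough for the L2 cache. The paper's actual partitioning scheme and its compression of intermediate results are not reproduced, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def build_partitioned_index(
    s_sets: Sequence[Set[int]], partition_size: int
) -> List[Dict[int, List[int]]]:
    """Build one inverted file per contiguous range of S record ids.

    Assumption (hypothetical scheme): partition p covers S ids in
    [p * partition_size, (p + 1) * partition_size). Lists stay sorted
    because ids are appended in increasing order.
    """
    num_partitions = (len(s_sets) + partition_size - 1) // partition_size
    partitions: List[Dict[int, List[int]]] = [
        defaultdict(list) for _ in range(num_partitions)
    ]
    for sid, elements in enumerate(s_sets):
        index = partitions[sid // partition_size]
        for e in elements:
            index[e].append(sid)
    return partitions


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Merge-based intersection of two sorted id lists."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def set_containment_join(
    r_sets: Sequence[Set[int]],
    s_sets: Sequence[Set[int]],
    partition_size: int = 1024,  # hypothetical tuning knob, must be >= 1
) -> List[Tuple[int, int]]:
    """Return all (rid, sid) pairs with r_sets[rid] a subset of s_sets[sid]."""
    result: List[Tuple[int, int]] = []
    for p, index in enumerate(build_partitioned_index(s_sets, partition_size)):
        # All R records are probed against this partition before the next one,
        # so the inner loop only touches this partition's (shorter) lists.
        first = p * partition_size
        last = min(first + partition_size, len(s_sets))
        for rid, r in enumerate(r_sets):
            if not r:
                # The empty set is contained in every S record of the partition.
                result.extend((rid, sid) for sid in range(first, last))
                continue
            # dict.get does not insert into the defaultdict; missing -> [].
            # Intersect shortest lists first to keep candidates small.
            lists = sorted((index.get(e, []) for e in r), key=len)
            candidates = lists[0]
            for lst in lists[1:]:
                if not candidates:
                    break
                candidates = intersect_sorted(candidates, lst)
            result.extend((rid, sid) for sid in candidates)
    return sorted(result)
```

For example, with `R = [{1, 2}, {3}, set()]`, `S = [{1, 2, 3}, {2, 3}, {1, 2}, {4}]` and `partition_size=2`, `set_containment_join(R, S, 2)` returns `[(0, 0), (0, 2), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2), (2, 3)]`. A smaller `partition_size` shortens the lists touched per pass at the cost of more passes over R; where the balance lies on real hardware is not something this sketch measures.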

Similar resources

Performance Comparison of Main-Memory Algorithms for Set Containment Joins

We evaluate and compare performance of signature nested loops, set partitioning, and inverted lists algorithms for set containment joins. We study running time and required storage space for the algorithms depending on such characteristics of input data sets as number of records, average cardinality of the set-valued attribute, and cardinality of the set elements domain. We outline implementati...

Fast, Incremental Inverted Indexing in Main Memory for Web-Scale Collections

For text retrieval systems, the assumption that all data structures reside in main memory is increasingly common. In this context, we present a novel incremental inverted indexing algorithm for web-scale collections that directly constructs compressed postings lists in memory. Designing efficient in-memory algorithms requires understanding modern processor architectures and memory hierarchies: ...

On the Intersection of Inverted Lists

In this paper, we discuss an efficient and effective index mechanism to support set intersections, which are important to evaluation of conjunctive queries by search engines. The main idea behind it is to decompose an inverted list associated with a word into a collection of disjoint sub-lists by arranging a set of word sequences into a trie structure. Then, by using a kind of tree encoding, we...

Divide-and-Conquer Algorithm for Computing Set Containment Joins

A set containment join is a join between set-valued attributes of two relations, whose join condition is specified using the subset (⊆) operator. Set containment joins are used in a variety of database applications. In this paper, we propose a novel partitioning algorithm called Divide-and-Conquer Set Join (DCJ) for computing set containment joins efficiently. We show that the divide-and-conquer a...

PIEJoin: Towards Parallel Set Containment Joins

The efficient computation of set containment joins (SCJ) over set-valued attributes is a well-studied problem with many applications in commercial and scientific fields. Nevertheless, there still exists a number of open questions: An extensive comparative evaluation is still missing, the two most recent algorithms have not yet been compared to each other, and the exact impact of item sort order...

Publication date: 2005